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ABSTRACT 

Inferring the number of planets N in an exoplanetary system from radial velocity 
(RV) data is a challenging task. Recently, it has become clear that RV data can con¬ 
tain periodic signals due to stellar activity, which can be difficult to distinguish from 
planetary signals. However, even doing the inference under a given set of simplifying 
assumptions (e.g. no stellar activity) can be difficult. It is common for the poste¬ 
rior distribution for the planet parameters, such as orbital periods, to be multimodal 
and to have other awkward features. In addition, when N is unknown, the marginal 
likelihood (or evidence) as a function of N is required. Rather than doing separate 
runs with different trial values of N , we propose an alternative approach using a 
trans-dimensional Markov Chain Monte Carlo method within Nested Sampling. The 
posterior distribution for N can be obtained with a single run. We apply the method 
to v Oph and Gliese 581, finding moderate evidence for additional signals in v Oph 
with periods of 36.11 ± 0.034 days, 75.58 ± 0.80 days, and 1709 ± 183 days; the 
posterior probability that at least one of these exists is 85%. The results also suggest 
Gliese 581 hosts many (7-15) “planets” (or other causes of other periodic signals), but 
only 4-6 have well determined periods. The analysis of both of these datasets shows 
phase transitions exist which are difficult to negotiate without Nested Sampling. 

Key words: stars: planetary systems — techniques: radial velocities — methods: 
data analysis — methods: statistical 


1 INTRODUCTION 

The number of known extrasolar planets has exploded in 
the last two decades. This has been driven by improvements 
in all of the different techniques used to detect and charac¬ 
terise exoglanets including the radial velocity (RV) method 
(e.g. ISato et al.| 120121 ). the transit method (e.g. iBatalha 
2014) , and gravitati onal microlensing (e.g. iBennett et al. 
2014 lYee et ahll2014l) . 

The problem of inferring the properties of an exo¬ 
planetary system from observational data can be challeng¬ 
ing. In the case of radial velocity data, the expected sig¬ 
nal due to an exoplanet is periodic, and the goal is to 
infer the number of planets in the system, as well as 
their properties such as orbital periods and eccentricities. 
Many different techniques have been proposed for doing 
this. These techniques fall i nto two main classes: i) thos e 
based on periodograms (e.g. IZechmeister fe Kiirsteii [20091) . 
and ii) those based on model fitting in the Bayesian infer¬ 
ence framework, to describe the uncertainties probabilisti- 
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cally (e.g- Gregory 20111; 

Feroz & Hobson 2014; Tuomi 20 111; 

Hou. Goodman, & Hogg 

20141). Bayesian model fitting via 


Markov Chain Monte Carlo (MCMC) tends to be compu¬ 
tationally intensive, especially if we want to calculate the 
posterior distribution for N, the number of planets. 


It is well known that RV datasets can contain peri¬ 
odic signals resulting from stellar activity rather than plan¬ 
ets, which can affec t the conclusions we draw about exo¬ 
planet systems (e.g. Queloz et al.ll200ll : iBonfils et al.l 120071 : 
iRobertson et alj|2014l ). Therefore. it is important to develop 
models which attempt to distinguish stellar activity signals 
from Keplerian planet signals based on the shape of the os¬ 
cillations and/or additional data constraining the periods 
of any stellar activity signals. We do not address this im¬ 
portant challenge in the present paper. Rather, we consider 
the problem of inferring the number N of Keplerian signals 
in an RV dataset in a computationally efficient way, under 
the simplifying assumption that only Keplerian signals are 
present in the data. 


We i ntroduce a tran s-dimensional birth-death MCMC 
approach dStephensl2000l ) to inferring N. When N is treated 
as just another model parameter, we can obtain its poste- 
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rior distribution in a single run. In addition, rather than 
trying to sample the post erior dist ribution, we use Diffu¬ 
sive Nested Sampling (DNS lBrewer. Partav. fc CsanvfeOlll 'l 
which replaces the posterior distribution with an alterna¬ 
tive mixture of constrained priors, allowing mixing between 
separated modes. As a result, we are able to sample the 
posterior distribution for N , and evaluate the marginal like¬ 
lihood (including the sum over N) in a single run which 
takes about 10 minutes on a 2-3 planet system. On the 
other hand the approach of [Gregory] d201lf) takes approx¬ 
imately 30 minutes per planet (Gregory, priv. comm.). A 
C-|—b implementation of our method is available online at 
https://github.com/eggplantbren/Exoplanet under the 
terms of the GNU General Public Licence. 


2 INFERENCE 

Bayesian inference is the use of probabilit y the¬ 
ory to describe u ncert ainty dSivia fe Skillinel 120061 : 
lO ’Hagan and Forsteij l2004l h In this framework, we ap¬ 
proach data analysis problems by first constructing a 
hypothesis space, which is the set of possible answers to 
the problem we are considering. Normally, this is the set 
of possible values of a vector of parameters 9 whose values 
we want to know. We then assign probability distributions 
called the prior and the sampling distribution. The prior 
distribution p(9) describes our initial uncertainty about 
which values of the parameters 9 are plausible, and the sam¬ 
pling distribution p(D\9) describes our initial uncertainty 
about the data set we’re going to observe, as a function of 
the unknown parameters 9. When the data is known, our 
state of knowledge about the parameter is updated from 
the prior p{9) to the posterior distribution given by Bayes’ 
rule: 

mm - r ZMm (i) 

where p(D\9) as a function of 9 is called the likelihood, once 
the actual dataset has been substituted in. Note that some 
authors do not distinguish between a sampling distribution 
and a likelihood. Throughout this paper we use the term 
sampling distribution for p(D\9) if we are discussing a prob¬ 
ability distribution (actually a family of them, indexed by 9) 
over the set of possible datasets. We use the term likelihood 
when the actual dataset has been plugged in, when p(D\9) 
becomes a scalar function (not a probability distribution) 
over the parameter space. 

The denominator, often called the evidence or marginal 
likelihood, is given by the expected value of the likelihood 
with respect to the prior: 

Z=p(D) = I p(9)p(D\9)d n 9 (2) 

where the integral is over the entire n-dimensional parameter 
space. In the context of Bayesian computation, the prior is 
often denoted n(9), the likelihood L(9), and the marginal 
likelihood Z. 


2.1 Inferring the number of planets 

The number of orbiting planets, N, is an important param¬ 
eter. To calculate the posterior distribution for N, most au¬ 


thors consider various trial values of N, and calculate the 
marginal likelihood 


p(D\N) = J p(9\N)p(D\9,N)d n 9 

for each possible val u e of N (e.g. Gregory! 
Feroz. Balan. fe Hobsonl 1201 ll : iFeroz fe Hobsonl 


(3) 


2011 


2014 


Hou. Goodman, fe Hogg||2014 1. marginalising over all other 


model parameters. The posterior distribution for N can 
then be found straightforwardly by using Bayes’ rule with 
N as the only unknown parameter: 


p(N\D) 


p(N)p(D\N) 

j: nP (n) p (d\n)- 


(4) 


Popular methods for c alcu lating the marginal likelihood are 
Nested Sampling (^killing 2006) a nd id eas related to ther¬ 
modynamic integration (e.g. Nealll200lll. Relationships be¬ 
tween thes e methods are discuss ed by iGameron fc Pettittl 
(1 201 4) and I Poison fe Scottl (12014 1. 


This traditional approach can be very time consum¬ 
ing. Methods for calculating the marginal likelihood are al¬ 
ready more intensive than standard MCMC methods for 
sampling the posterior, because they usually involve a se¬ 
quence of probability distributions (e.g. the constrained pri¬ 
ors in Nested Sampling, or the annealed distributions in 
thermodynamic integration) rather than a single distribu¬ 
tion (the posterior). This intensive process needs to be run 
many times, for N = 0, N = 1, N = 2, and so on. 

The traditional approach to inferring N also contradicts 
fundamental ideas in Bayesian computation. Imagine we are 
trying to compute the posterior distribution for a parameter 
a in the presence of a nuisance parameter b. This is usually 
solved by exploring the joint posterior for a and b, and then 
only looking at the generated values of a. Nobody would 
suggest the wasteful alternative of using a discrete grid of 
possible a values and doing an entire Nested Sampling run 
for each, to get the marginal likelihood as a function of a. 
When the hypothesis space for a is discrete, MCMC is still 
possible and there is no reason to switch to the wasteful 
alternative. 


2.2 Trans-dimensional MCMC 


Trans-di mensio nal MCM C methods such as birth-death 
MCMC dstephensl l2000ll or the more general reversible 
jump MCMC ( Greenl 19951 ') treat the model dimension 
N as just another model parameter. At fixed N, stan¬ 
dard techniques such as the Metropolis algorithm can 
be used to explore the posterior distribution. Addi¬ 
tional moves that propose to change the value of N 
are also defined. The simplest of these are birth-death 
moves. More complicated moves, such as split-and-merge, 
are possible but not always necessary. Trans-dimensional 
MCMC is a natural tool for a wi de range of astronom¬ 
ical data analys i s problems (e.g. Umstatte£ i et_alJ_ 2005|: 
Walmswell_gt_al. _ gOljl^jBrewer jForgman-Mackev. fe Hoed 
20131 : Ijones. Kashvap. fe van Dvkll2()l ll ) . 

In the exoplanet context, a birth move proposes to add 
one more planet to the model. The new planet’s properties 
(period, amplitude, eccentricity, etc) are drawn from their 
prior distribution which may depend on other other model 
parameters or hyperparameters. The corresponding death 
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move simply chooses a planet currently in the model, and 
removes it. The acceptance probability for these moves is 1 
if we want to expl ore the prior. T o implement these moves in 
Nested Sampling (ISkillinell2006ll . where the target distribu¬ 
tion is proportional to the prior but with a hard likelihood 
constraint, then the acceptance probability is 1 if the pro¬ 
posed move satisfies the likelihood constraint, and 0 if it 
does not. 

A recent paper (|Brewe 3 Uni) introduced a general 
approach to implementing trans-diin ensio nal models within 
Diffu sive Nested Sampling llBrewer. Partav. fe Csanvil 
Em]), a general MCMC algorithm. The iBrewe r J2014h 
software predefines the Metropolis proposals for exploring 
trans-dimensional target distributions, including when the 
prior for the properties of each model component (i.e. each 
planet) is defined hierarchically. 

2.3 Phase transitions 

It is well known that Bayesian computation (using MCMC 
for example) can be difficult when the posterior distribution 
is multimodal or has strong dependencies between param¬ 
eters. An uncommon but less we ll-kn own difficulty is the 
existence of phase transitions (ISkillinell2006h . 

Imagine a high-dimensional unimodal posterior distri¬ 
bution that is composed of a broad, high volume but low 
density “slab” with a narrow, low volume but high density 
“spike” on top of it. An example is a mixture of two concen¬ 
tric high-dimensional gaussians with different widths. If you 
ran MCMC on such a posterior, it would be difficult to jump 
between the slab and the spike components. If the MCMC 
is currently in the spike region (or phase) it will be unable 
to escape: a proposed move into the slab will be rejected 
because of the ratio of densities. Conversely, if the MCMC 
was in the slab region, it would be unlikely to go into the 
spike region, because its volume is so small: it would be very 
unlikely to propose to move into the spike. Thus, the situa¬ 
tion behaves much like a multimodal posterior, despite only 
being unimodal. 

If the slab contains a very small amount of posterior 
probability, it is not a problem if an MCMC algorithm 
spends all its time in the spike. However, this situation could 
still cause problems with the calculation of the marginal like¬ 
lihood if annealing methods are used. The thermodynamic 
integral formula gives the log of the marginal likelihood Z 
as an average of log likelihoods: 

log (Z) = [ (log [ L ( d )})p d P (5) 

Jo 

where the expectation is taken with respect to the dis¬ 
tribution with “inverse temperature” /?, proportional to 
n(6)L(6) 13 . Even if the slab contains virtually zero proba¬ 
bility when fi — 1 (i.e. the posterior), for some values of 
the inverse temperature fi the slab and the spike will both 
be important. At these temperatures the MCMC will fail to 
mix (it will incorrectly spend all its time in either the slab or 
the spike, rather than mixing between the two) and will give 
a misleading estimate of the average log likelihood at that 
temperature and therefore an incorrect marginal likelihood 
estimate. 

Phase transitions are well known in statistical mechan¬ 
ics, but can also occur in Bayesian data analysis. Typ¬ 


ically this occors when the data contains a “big” effect 
which provides a lot of information about some parame¬ 
ters, and a “small” subtle effect as well. Nested Sampling, 
and variants such as DNS, are not affected by phase tran¬ 
sitions because the exploration only makes use of likeli¬ 
hood rankings , rather than likelihood values themselves, and 
are therefore invariant under monotonic transformations of 
the likelihood function. Part of their output, the relation¬ 
ship between the likelihood L and the enclosed prior mass 
X(L) = f 7t((9)1 [L(9) > L] d n 6 can be used to diagnose 
whether the problem contains a phase transition. In par¬ 
ticular, if the graph of log(L) vs. log (A') is convex at some 
point, then a phase transition exists (iskillinelbood'l . 

2.4 Parameters and Priors 

To fit a planet model to RV data, we need parameters to 
describe the properties of each planet. For simplicity, we de¬ 
scribe each planet by five parameters: the orbital period Pi, 
the semi-amplitude (in metres per second) of the RV signal 
Ai, the phase of the signal (j>i (defined such that (j> = 0 gives 
an RV signal whose maximum is at t = 0), the eccentricity 
d, and the “viewing angle” uii (also known as the longitude 
of the line of sight). We defined our parameters such that 
in the limit of zero eccentricity, the RV signal of a planet 
reduces to Ai cos (2nt/Pi + (j>i). 

The unknown parameters are: 

|JV, a,{l/’}dl,m 0 ,Oextra,Z'j (6) 

where N is the number of planets, a = {pp,ap, pa} are 
hyperparameters hyperparameters used to define the prior 
for the properties of the planets, and ij)i = {Pi,Ai, a, 4>i,Wi} 
are the properties of planet i. The parameter mo describes 
a DC offset in the data, and <7 e xtra and v are parameters 
of the noise distribution which are discussed further below. 
Note that our parameter u>i is standard, however (f>i is non¬ 
standard because we assert that (j>i = 0 always implies the 
signal is at its maximum at t = 0. Our parameter space is 
equivalent to the standard one, we are just using a different 
coordinate system. 

A standard assumption for the probability distribution 
of the data given the parameters (known as the sampling 
distribution, which becomes the likelihood function when 
the dataset is known) is a normal distribution with standard 
deviation ai known from the error bars in the data set. How¬ 
ever, it is usually recommended to put in “safety features”, 
in case the data set contains any discrepant measurements, 
or in case the error bars in the data set are underestimated. 
To achieve this, we used a student-t distribution instead of 
a normal distribution, with scale parameter af + Uextra 
and shape parameter v. The parameter <T ex t ra is an “extra 
noise” parameter that effectively increases the size of the er¬ 
ror bars, and the shape parameter v allows for heavier tails 
than a normal distribution. If v is large, the student-t dis¬ 
tribution is approximately a normal distribution, and if v 
is small the noise distribution has much heavier tails. For 
instance, when v = 1 the student-t distribution becomes a 
Cauchy distribution. 

All of the model assumptions are specified in detail 
in Tabic [l] We assigned hierarchical priors to some of the 
planet’s parameters (i.e. the prior for the planets’ parame- 
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ters is defined conditional on some hyperparameters). This 
allows the model to capture the idea that knowing the val¬ 
ues of some planet’s parameters provides some information 
about the parameters of another planet. Not using hierar¬ 
chical priors usually implies a strong prior commitment to 
the hypothesis that the properties of the planets are spread 
out across the whole domain of possible values, which is not 
necessarily the case. 

Most of our priors were chosen to represent vague prior 
knowledge, rather than the judgement of an informed ex¬ 
pert on extrasolar planets. Uniform distributions were used 
for parameters such as phases, where time-translation sym¬ 
metry seems plausible. For some parameters we assigned the 
distribution in terms of the log of the parameter, rather than 
the parameter itself, when the parameter is positive and un¬ 
certain by orders of magnitude. Truncated Cauchy distribu¬ 
tions were used when there is a preferred value, but since 
these have very heavy tails, the assumption is quite fail safe 
relative to other possible “informative” assignments such as 
normal distributions. For example, the prior for yp, the typ¬ 
ical orbital period, is centered around 1 year but could be 
as low as « e -21 years or as high as ~ e 21 years, a very 
generous range. A uniform distribution for log (yp) would 
have been more conventional, whereas the Cauchy distribu¬ 
tion expresses a slight preference for fip being of order one 
year. 

An apparently strange choice is the conditional prior 
for the (logarithms of) the orbital periods, which is a biex¬ 
ponential distribution given a location parameter yp and a 
scale parameter wp which determines the width of the dis- 
tributiorQ. Rather than assigning independent priors to the 
log periods, the hierarchical model allows for the periods to 
“cluster around” a typical period ^ip if there is evidence for 
this. On the other hand, independent priors for the periods 
would imply a strong prior commitment to the hypothesis 
that the periods are spread out across the whole prior vol¬ 
ume (equivalent to assuming a fixed large value for wp). 
A more conventional choice for the conditional prior given 
yp and wp would have been a normal distribution. However, 
the |jB rewerl (120141 ) software needs to know the corresponding 
cumulative distribution and its inverse, which are not avail¬ 
able in closed form for the normal distribution. Our prior 
for wp, which controls the diversity of the log-periods, was 
uniform between 0.1 and 3, since it is unlikely that many 
planets have extremely similar or extremely different (over 
several orders of magnitude) orbital periods. 

For the velocity semi-amplitudes {Ai}, we chose an ex¬ 
ponential distribution given the hyperparameter yA which 
sets the mean of the exponential distribution. Our prior for 
yA spans many orders of magnitude but expresses a slight 
preference for /.la being of order unity, using a Cauchy dis¬ 
tribution. The prior for the semi-amplitudes will influence 
how many of these low amplitude planets will be inferred: if 
we believe there are many, and the data are uninformative 
about low amplitude planets, then the posterior distribution 
for N will also indicate that there may be many low ampli¬ 
tude planets. However, their other properties, such as their 


1 A biexponential distribution with location parameter y and 
scale parameter w has probability density function p(x\y,w) = 


2w 


exp I- 


|x-m| ' 


orbital periods, will not be well determined. The Beta prior 
for eccentricity was suggested by Gregory (priv. comm) and 
is an approximation to the inferre d frequency d istribution 
of eccentricities in the population (Kipping 12013 ). 


3 ORBIT LOOK-UP TABLE 


The expected (noise-free) signal due to an exoplanet is peri¬ 
odic, but non-sinusoidal when the orbit is not perfectly cir¬ 
cular. The expected shape m(t) of the variations is needed in 
order to evaluate the likelihood function for any proposed 
setting of the parameters. To save time, we pre-computed 
the properties of orbits as a function of eccentricity. We also 
made the standard assumption that the planets do not in¬ 
teract, so the expected signal due to several planets is the 
sum of the contributions of each planet. 

Consider a test particle moving in the x-y plane under 
the influence of a point mass at the origin. The motion of 
the test particle represents the reflex motion of the host 
star orbiting around the center of mass of the system. The 
equations of motion for the particle are: 


d 2 x x 

dt 2 r 3 

d 2 y _ _y_ 

dt 2 r 3 


(7) 

( 8 ) 
(9) 


where r = y/x 2 + y 2 . The solutions to this system of equa¬ 
tions are elliptical orbits with the focus at the origin. We 
set the initial position to (1,0), and the initial velocity to 
(0, v) where v £ [0.4,1]. If v = 1, the orbit is circular 
and as v decreases the orbit becomes more elliptical. For 
trial values of v ranging from 0.4 to 1 in steps of 0.005, 
we calculated the orbit, and saved the velocities ^ and 
as a function of time to disk. These saved orbits were 
used as a lookup table for constructing the expected sig¬ 
nal y(t) due to a single planet. Because of the initial con¬ 
ditions, the simulated orbits were all horizontally aligned. 
If the observer is located on the x-axis a large distance 
from the origin, they will measure m(t) = x(t). However, 
if the observer is located at an angle co with respect to 
the x-axis, then the radial velocity measured will instead 
be m{t) = cos(u;)x(t) + sin(u;)y(t). Since the our orientation 
with respect to the orbits is unknown, each planet requires a 
“viewing angle” parameter w also k no wn as the longitude of 
the line of sight (johta, Taruva, fe Sutoi2005l ). The eccentric¬ 
ity of the orbit, in terms of v, is e = 1 — v 2 . By precomputing 
a set of orbits before running the MCMC, we are able to do 
~ 15,000 likelihood evaluations per second per CPU core. 


4 DEMONSTRATION ON SIMULATED DATA 

To test our proposed methodology, we generated a simu¬ 
lated dataset for a system with N = 7 planets. The dataset 
was “inspired by” the v Oph dataset (Section 0 , and con¬ 
tains two large signals with periods of 530 and 3120 days, 
whose semi-amplitudes are 291 m s -1 and 181 m s -1 re¬ 
spectively. The other five planets have much lower semi¬ 
amplitudes, ranging from 4-30 ms -1 . The standard devia¬ 
tion for the noise in the data was 5 m s , so some of these 


© 0000 RAS, MNRAS 000, 000-000 












5 


Quantity 

Meaning 

Prior 

Hyperparameters 

N 

HP 

Wp 

HA 

Number of planets 

Median orbital period (years) 

Diversity of orbital periods 

Mean amplitude (metres per second) 

Uniform({0,1,AT max }) 

log(//p) ~ Cauchy(location = 5.9, scale = 1)T(—15.3, 27.1) 

wp ~ Uniform (min = 0.1, max = 3) 

log(HA) ~ Cauchy (location = 0, scale = 1)T(—21.2, 21.2) 

Planet Parameters 

Pi 

Ai 

ei 

UJi 

Orbital period 

Semi-amplitude of signal 

Phase of signal 

Eccentricity of orbit 

Viewing angle 

log(Pi) ~ Biexponential(location = log(/^p), scale = wp) 
Exponential(mean = ha) 

Uniform(min = 0, max = 2n) 

Beta(n = 1,(3 = 3.1)T(0, 0.8) 

Uniform(min = 0, max = 2n) 

Other 

mo 

0”extra 

V 

Constant DC offset 
“Extra noise” parameter 

Shape parameter for ^-distribution for noise 

Uniform(min = mo,min, max = mo,max) 

log(cr) ~ Cauchy(location = 0, scale = 1)T(—21.2, 21.2) 

log(^) Uniform(min = log(0.01), max = log(1000)) 

Data 



Yi 

Radial velocity measurements 

Student-t ^location = m(U), scale = yj + ^extra’ shape = v'j 


Table 1. All of the prior distributions in our Bayesian model. The priors for the planet parameters are defined conditional on the values 
of the hyperparameters. Uniform priors were used for parameters like phases. For parameters where a rough initial guess is possible, 
heavy-tailed Cauchy distributions were used so this information could be taken into account in a non-dogmatic way. Time units are in 
days and amplitude units are in metres per second. The maximum number of planets, ;V m ^ x , was set to 10. The prior limits for mo were 
set to the minimum and maximum value in the dataset, which is not strictly a valid strategy. Some of the distributions were truncated 
for numerical reasons, the truncated intervals are specified with the T(a, b) notation. 



Figure 1. A simulated dataset that “resembles” the v Oph 
dataset, which we used to test our methodology. The dominant 
signal is from two large planets with periods periods of 530 and 
3120 days and semi-amplitudes of 291 m s" 1 and 181 m s" 1 
respectively. There are also five much smaller planets which con¬ 
tribute small additional effects to the data. 

low-amplitude signals should be detectable. The simulated 
data is shown in Figure[l] along with the true radial velocity 
curve m(t) that was used to generate the data. 

We ran our algorithm on the simulated dataset to ob¬ 
tain samples from the posterior distribution. We obtained 
520 posterior samples. The posterior distribution for N, the 
number of planets, is shown in Figure [2] The true number of 
planets, 7, is not the most probable value, but it does have 
substantial probability. The posterior distribution suggests 
that N could be anywhere from 6 to 10. 

The posterior distribution for the periods {Pi}fLi is 
shown in Figure [3] Because of the label switching degen¬ 



Figure 2. The posterior distribution for the number of planets 
N given the simulated dataset. The true number of planets was 
7. 


eracy, the posterior distribution for each period is identical, 
so we pooled the samples for all periods. Defining the log- 
periods by Si = log 10 [Pi/{ 1 day)], Figure[3]is a Monte Carlo 
representation of the mixture distribution 

10 N 

m = 5>(ATp) Epos'* \n,d). (io) 

N=0 i =1 

If a certain period is accurately measured (i.e. it appears in 
close to 100% of the posterior samples and the distribution 
for its period is very narrow) then it will appear in Figure 
3 with a height of ~ 520. If the uncertainty in the period is 
larger than the histogram bin width then the peak will be 
spread over several bins. 

The posterior distribution for the periods, shown in Fig- 
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Figure 3. The posterior distribution for the periods given the 
simulated dataset. The solid lines are the true periods, six of 
which were detected with high posterior probability. 



1.5 2.0 2.5 3.0 3.5 



bg ll( iT(:7'iod/d;Lys) 


Figure 4. The joint posterior distribution for the periods and the 
amplitudes (top panel) and the periods and eccentricities (bottom 
panel) from the simulated dataset. The true values are plotted as 
circles. 


ure[3] shows that six of the true periods were recovered, with 
probability close to 1. One period (with log 10 Pi x 1.15) 
which was actually present was not “detected” because it 
had a very small amplitude. We note that the posterior prob¬ 
ability near this period should not be precisely zero. There is 
also some evidence for periods which did not actually exist, 
however the posterior probabilities for these peaks are not 
close to 1. 

The joint posterior distribution for the periods and the 
amplitudes of the signals is shown in Figure [4] along with the 
eccentricities. As with Figure [3j the samples for all planets 
were combined. The true values are also plotted as circles. 
Clearly, the reason the period of log 10 (Pi) x 1.15 was not 
“detected” was that it had a very low amplitude of approx¬ 
imately 4 m s _1 which is below the noise level. 


5 APPLICATION TO v OPH 


The v Oph system is generally accepted to have two 
confi r med planets (e.g. iQuirrenbach, Reffert, fc Berg mannl 
l201ll ; ISato et ahll2012l : Hou. Goodman. fe Hogg|l2014l) . with 



Number of Planets 


Figure 5. The posterior distribution for the number of planets 
N orbiting v Oph. The posterior probability of N > 2 is about 
88%, however the prior probability of N > 2 was already high 
due to the uniform prior for N. 


periods of 530.3 and 3190 da ys. To test our ap proach we 
applied it to the RV data from iSato et alj (120121) . The pos¬ 
terior distribution for N, the number of planets, is shown 
in Figure [5j showing that N could be anywhere from 2 to 
10, and the posterior probability that N ^ 3 is about 88%. 
Of course, these extra possible signals are not necessarily a 
planet but a feature in the data which is better explained 
by a periodic signal than by noise (and may have been ex¬ 
plained by correlated noise, had we included it). 

The posterior distribution for the logarithms of the pe¬ 
riods is shown in Figure [6] As in Section [4] the posterior 
samples for all periods were combined to make this figure, 
which shows several prominent peaks. The two peaks with 
vertical dashed lines are the commonly accepted periods of 
530 and 3190 days, and the other prominent peaks (i.e. sig¬ 
nals which have a moderate probability of existence) have 
periods of 36.11 ± 0.034 days, 75.58 ± 0.80 days, and 1709 
± 183 days. As with any MCMC output, if we are inter¬ 
ested in the probability of any proposition Q (for example, 
“Q = a planet exists with period between 35 and 37 days”), 
we can calculate the proportion of the posterior samples for 
which Q is true, which (if we have a lot of samples) is a 
Monte Carlo estimate of the posterior probability of Q. For 
v Oph, we calculated the probability that at least one of 
these “extra” signals (beyond the two commonly accepted 
ones) exists, as 85%. Given that they exist, their amplitudes 
are low, around 5-40 metres per second, which we note is 
above the noise level. An example model fit to the data is 
shown in Figure [3 

Figure [8] shows the relationship between the likelihood 
L and the enclosed prior mass X for the v Oph analysis. 
These plots are a standard output of Nested Sampling anal¬ 
yses, and provide insights into the structure of the problem. 
Concave-up regions of this curve indicate phase transitions 
which can cause severe problems for annealing-based meth¬ 
ods, and sometimes even for sampling the posterior distribu¬ 
tion. In this analysis, the models with a third signal exist to 
the left of the phase transition at log(X) x —70, and models 
without the signal exist to the right of the phase transition. 
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Figure 6. The posterior distribution for the orbital periods in 
the v Oph system. The two dashed vertical lines are the com¬ 
monly accepted periods of 530.3 and 3190 days. The next most 
prominent peaks with well determined periods are at log periods 
of around 1.6, 2.4, and 3.3, corresponding to periods of 36.11 =L 
0.034 days, 75.58 =b 0.80 days, and 1709 ± 183 days. 


Figure 8. Log likelihood vs. enclosed prior mass (a standard 
output of Nested Sampling algorithms) for the v Oph analysis. 
There are several phase transitions (concave-up regions) present. 
The small one at log(X) « —70 separates models which contain 
additional signals from models that do not. Without using Nested 
Sampling it would be more difficult to mix between these two 
situations and calculate the posterior probability for the existence 
of the additional signals. 



Time (days) 

Figure 7. The v Oph radial velocity data and an example model 
fit which includes a third period. The amplitude of this additional 
signal is low but is about twice the reported errorbars on the 
measurements. 


Mixing between these two phases is crucial for accurately 
computing the posterior probability that extra signals exist. 

The marginal likelihood for our model was log (Z) « 
—220.5. Nested Sampling also allows for calculation of the 
“information”, or Kullback-Leibler divergence (a quantity in 
information theory) from the prior to the posterior, which 
quantifies how much we learned about the parameters: 


n 


J p(0\D) log 


p(e\D) 

p{e) 


dd 


(ii) 


An intuitive interpretation of this quantity is the number of 
times the prior distribution had to be compressed by a factor 
of e (if the logarithm in the formula is a natural logarithm) 


to get to the posterior distribution. For the v Oph data the 
information was H « 76.6 nats (natural units) or 111 bits, so 
the posterior occupies about e -76 ' 6 times the prior volume. 


6 APPLICATION TO GLIESE 581 

The red dwarf star Gliese 581 is thought to host several plan¬ 
ets. Exactly how many is a m a tter o f considerable debate. 
According to iRobertson et all (I20l4) . there are two plan¬ 
ets (b and c, with periods 5.36 and 12.91 days respectively) 
whose existence is generally accepted, two more (d and e, 
with periods 66 and 3.15 days respectively) whose existence 
was mostly accepted, and another two (f and g, with periods 
433 and 36.5 days resp ectively) whose existenc e was gener¬ 
ally doubted. However, IRobertson et all (120141) found that 
planets d and g do not exist but are signals due to stellar 
activity. While our model cannot account for stellar vari¬ 
ability and contribute to that particular discussion, it is a 
challenging and interesting dataset from an inference point 
of view. To run our code on the combined dataset from the 
HARPS and HIRES spectrographs, we extended the model 
to include separate DC offsets for each instrument, as well 
as separate “extra noise” parameters so and v. We also in¬ 
creased Amax from 10 to 15 for this system. 

The posterior distribution for N is shown in Figure [9] 
and shows strong evidence for at l east eight per iods (P(N ^ 
7| D) = 0.012). Some authors (e.g. lTuoinill201 J) recommend 
that the probability of N planets should be ^ 150 times 
greater than the probability of N — 1 planets existing be¬ 
fore making a claim that N planets have been definitively 
detected. Such a decision rule is presumably equivalent to 
a utility function where false positives are much worse than 
false negatives. We note that applying this rule to our re¬ 
sults, we would assert N = 6, even though this has a very 
small posterior probability. 
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Figure 9. The posterior distribution for the number of planets 
N orbiting Gliese 581. 

It is now recognised that there are many possible 
sources for oscillations in a data set and not all such os¬ 
cillations should be claimed as planets. Our model cannot 
distinguish between oscillations due to planets and oscilla¬ 
tions due to stellar activity: any oscillations found in the 
dataset will be described as “planets” by the model. An¬ 
other important consideration is the physical stability of 
th e orbital syst em which is ignored by this type of analysis 
l|T6th fc Nagy|[2014l 'l. However it is interesting that we find 
many more signals in the data than previous authors. By 
inspecting the posterior distribution for the periods (Fig¬ 
ure m, we see that only 4 of the periods are well deter¬ 
mined and have a posterior probability close to 1 (i.e. they 
are present in all samples), corresponding to the known pe¬ 
riods of Gliese 581 b, c, d, and e. The other “periods” are 
more uncertain. As with v Oph, we can calculate the poste¬ 
rior probability of any hypothesis about Gliese 581 by com¬ 
puting the fraction of the posterior samples that have that 
property. The posterior probabilities for planets b, c, d, and 
e, are close to 1. The posterior probability a signal exists 
with log 10 (P) G [1.55, 1.57] is 88%, and the probability for 
a signal with log 10 (P) G [2.6, 2.8] is 85%. 

One possible explanation for the large number of in¬ 
ferred signals that a non-sinusoidal signal due to stellar 
activity is being modelled as several periods (e.g. as hap¬ 
pen^ iii_asteroseismology when sinusoidal models are used; 
iBrewer et al.l 120071 ). and if the mod el were extended to in - 
clude a “stochastic” oscillation (e.g. iBrewer fc Stelloll2009li . 
the number of periods detected may be reduced substan¬ 
tially. Another contributing factor is the prior for the ampli¬ 
tudes. With these kinds of models, the posterior distribution 
for N can be influenced by the prior for the amplitudes Ai. 
Many authors assign independent broad priors to the am¬ 
plitudes, and this causes the “Occam’s razor” penalty for 
adding extra signals to be quite strong. Since we use a hi¬ 
erarchical prior for the amplitudes, if some amplitudes are 
found to be low, /ta will become small. When /xa is small, 
it is likely that any extra signals will have small amplitudes, 
so the “Occam’s razor” effect is weaker. An example model 
fit for Gliese 581 is shown in Figure [lT] 

The marginal likelihood was log (Z) ~ —616.3 and the 



logio [Period (days)] 

Figure 10. The posterior distribution for the orbital periods in 
the Gliese 581 system. The solid lines are Gliese 581 b and c, the 
dashed lines are “planets” d and e, and the dotted lines are f and 
g- 
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Figure 11. A portion of the Gliese 581 data and an example 
model fit. 

information was H « 130.4 nats. This compares favourably 
to the m arginal like l ihood of — 640.1 (for a 6-planet model) 
found bv lHou. Goodman, fc Hogg] (l2014f) . although it is un¬ 
clear whether we used exactly the same dataset. Interest¬ 
ingly, the log-likelihood curve (Figure [T2l) shows this prob¬ 
lem has two phase transitions. While these do not affect the 
posterior distribution (as they did for v Oph), they would 
cause difficulties if we tried to calculte the marginal likeli¬ 
hood using annealing. 


7 CONCLUSIONS 

In this paper we introduced a trans-dimensional MCMC ap¬ 
proach to inferring the number of planets N in an exoplane¬ 
tary system from radial velocity d ata. T he MCM C was im¬ 
plemented using the framework of iBrewe 3 (l2014|j which de¬ 
fines trans-dimensional birth and death moves, and does the 
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Figure 12. Log likelihood vs. enclosed prior mass for the Gliese 
581 analysis. The two concave-up regions (at log(X) fa —25 and 
-45) correspond to phase transitions. Thermal approaches to this 
problem would produce misleading estimates of the marginal like¬ 
lihood because they would mix poorly at temperatures around 11 
and 4. 

sampling with respect to a Nested Sampling target distribu¬ 
tion, rather than directly sampling the posterior. This ap¬ 
proach allows us to compute the results in a single run, which 
provides posterior samples and an estimate of the marginal 
likelihood. By using Diffusive Nested Sampling, instead of 
directly trying to sample the posterior distribution, we can 
overcome difficult features in the problem, such as phase 
transitions and (to some extent) multiple modes. 

We applied the code to two well-studied RV datasets, 
v Oph and Gliese 581. In v Oph, we found some evidence 
for additional signals with low amplitude, but with several 
possible solutions for their periods. Given our modelling as¬ 
sumptions, the posterior probability at least one of these 
additional signals is real is 85%. The posterior distribution 
contains models both with and without these additional sig¬ 
nals, however, these are separated by a phase transition. 
Therefore mixing between the two situations would be infre¬ 
quent if we simply tried to sample the posterior distribution. 

With the combined HIRES+HARPS dataset from 
Gliese 581, we found evidence for a large number of “plan¬ 
ets”, although only four have well determined periods, cor¬ 
responding to the Gliese 581 b, c, d, and e. Since our model 
does not include any possibility of stellar variability, any 
such periodic signals will be attributed to “planets”. Includ¬ 
ing non-planetary stellar variability is a crucial next step. 
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